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ABSTRACT 

~  I  ‘  , .  i  „  >  r 

Titterington  •44£B3>"'proposed  recursive  methods  for  dealing  with 
incomplete  data.  The  present  paper  concentrates  on  versions  of  these  for 
multiparameter  problems  involving  missing  data.  Theorems  are  outlined  from 
which  asymptotic  properties  of  the  recursive  procedures  can  be  established  and 
versions  of  the  recursions  are  written  down  for  problems  in  which  the  missing 
data  are  missing  at  random.  After  illustration  with  exponential  family 
models,  the  case  of  multivariate  Normal  data  is  considered  in  detail. 

Numerical  comparisons  of  the  various  methods  are  obtained  using  bivariate 
Normal  data. 
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SIGNIFICANCE  AND  EXPLANATION 


This  paper  is  a  development  of  a  previous  report  (12376)  by  the  first 
author*  The  introductory  comments  to  that  paper  are  highly  relevant  here. 

\> 

Whereas  the  previous  paper  discussed  incomplete  data  in  general,  the  present 
one  restricts  attention  to  the  problem  of  missing  values.  Typically,  each 
experimental  unit  should  have  records  of  the  values  of  several  characteristics 
associated  with  it.  Statistical  analysis  is  made  difficult  if  one  or  more  of 
those  values  are  missing  on  some  units.  ^ 

To  combat  the  heavy  analysis  required  for  a  •proper*  analysis  of  the 
data,  comparatively  simple  recursive  procedures  are  outlined  in  which  the  data 
are  incorporated  sequentially  into  the  estimation  scheme.  Some  comments  are 
made  about  theoretical  properties  and  special  emphasis  is  laid  on  the  case  of 
data  from  multivariate  Normal  distributions. 
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RECURSIVE  ESTIMATION  PROCEDURES  FOR  MISSING-DATA  PROBLEMS 
D.  M.  Titterington*  and  J-M.  Jiang** 

X.  INTRODUCTION 

Suppose  yi,y2....  form  a  sequence  of  independent  observations  from  a  population  with 
parametric  probability  density  function  g(y|8),  where  8  is  a  vector  of  s 
parameters*  Let  Sty, 8),  the  vector  of  scores  corresponding  to  a  single  observation,  be 
defined  by 

Sj(y,|)  -  j|-  log  g(y I  8),  J  -  , 

and  let  1(8)  be  the  Fisher  Information  matrix  corresponding  to  one  observation.  It  is 

assumed  that  all  these  quantities  exist  and  that  the  ’usual*  regularity  conditions  hold  in 

order  that  differentiation  with  respect  to  8  and  integration  over  y  may  be 

interchanged.  Consider  the  recursion 

Vi  ”  2k  ♦  +  1)'1K8k)"1s(yk+i,8k),  k  -  0,1,...  (1) 

Under  certain  extra  mild  conditions  referred  to  in  Titterington  (1982),  as  It  ♦  ■>, 

/k  (8k  -  fT)  *  HIO.K^)*1) 

in  distribution,  where  8_  is  the  true  value  of  8. 

-T  - 

If  we  are  dealing  with  incomplete  data,  however,  the  asymptotically  efficient 
stochastic  approximation  (1)  may  not  be  easy  to  apply,  mainly  because  1(8)  can  be 
difficult  to  evaluate  and,  if  necessary,  invert.  The  former  problem  arises  even  with 
simple,  one-parameter  models,  such  as  the  estimation  of  the  mixing  weight  in  a  mixture  of 
two  known  densities,  and  both  problems  appear  in,  for  instance,  parameter  estimation  for  a 
mixture  of  two  univariate  Normal  densities.  These  illustrations  appear  in  Titterington 
( 1982)  where  recursions  alternative  to  ( 1 )  are  also  suggested.  One  natural  proposal  is  to 
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replace  (k  +  1)1(8^)  by  the  total  sample  Information  up  to  this  point.  This  might  still 
require  awkward  matrix  inversion.  Another  suggestion  is  to  use,  instead,  the  recursion 

Vi  -  ?k  ♦  (k  *  *  -  °'1'—  (2) 

where  I^(  0)  is  the  Fisher  Information  matrix  corresponding  to  a  complete  observation. 

Ic<6)  is  usually  easy  to  evaluate  and  in  many  applications,  is  easy  to  invert.  Again  an 
asymptotic  result  can  be  drawn  upon  and  the  recursion  has  strong  associations  with  the  EM 
algorithm  (Dempster,  Laird  and  Rubin,  1977)  for  maximum  likelihood  estimationi  see 
Titterington  (1982). 

The  objective  of  this  paper  is  to  develop  these  recursions  for  missing-data  problems, 
with  emphasis  on  multivariate  Normal  data,  and  to  assess  their  performance  on  soderately 
large  data-sets,  one  aspect  of  importance  is  the  dependence  of  the  results  on  the  order  in 

A 

which  the  data  are  incorporated.  Given  a  set  of  n  data  points  it  is  intended  that  8^, 
as  given  by  (1)  or  8^,  from  (2),  be  used  as  the  parameter  estimate,  one  pass  having  been 
made  through  the  data.  It  is  hoped  that  such  a  procedure  gives  estimators  which  perform 
well  and  yet  is  inexpensive  in  computer  time  in  comparison  with  the  iterative  numerical 
procedures,  such  as  the  EM  algorithm  ltaelf,  traditionally  used  with  incomplete  data. 

These  methods  should  prove  to  be  very  useful  with  large  incomplete  data-sets,  such  as 
sample  surveys  with  nonresponse. 
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2.  THEORETICAL  ASPECTS 

The  inportant  theoretical  questions  concern  the  asysptotic  properties  of  the  sequence 
of  estiaators  generated  by  the  stochastic  approxiaations  (1)  and  (2).  We  extract  the 
following  theorea  froa  the  work  of  Walk  (1977),  giving  the  essential  features  of  the  result 
but  not  detailing  all  the  aany  regularity  conditions. 

Theorea  1.  Consider  the  recursion 

®k+1  -  -®k  -  <*♦  o'VvW  * 

where  *<Yk+1lY1 '••••¥**  "  2*  *l»ost  surely,  and,  for  each  k,  *(¥*¥*>  <  •»  where  the 

expected  value  is  based  on  the  true  distribution.  Suppose 

f  ( 6)  -  f<e  >  ♦  Aiene  -  e  )  +  o|e  -  ej 

•  -  -  -T  -I  “  -I  •  *T 

and  that  the  eigenvalues  of  A^  *  A(8f)  are  all  greater  than  Then,  given  certain 
other  regularity  conditions,  as  k  ♦  •, 

&  <ev  “  ®J  *  «<0,B_)  * 

in  distribution,  whsrs  Btj  is  the  uniqus  solution  of 

(Ar  -  5  V*T  *  ’  2  V  -  ■V  ,3> 

In  this  equation,  I  denotes  the  s  *  s  identity  aetrix  and  M_  ”  lia  cov(V  ). 

k*- 

In  our  applications,  for  f(8k>  +  Y^i  *•  have 

-<*<Vs<iwV 

where  G(8)  -  l-1(8)  in  (1)  and  G(8)  -  l"1 ( 8)  in  (2). 

“*  “  “  C  “ 


S(y,8k)  -  B  S(y,0fc>  ♦  ts(y,8k)  -  B  My,^)} 

B  S(y.8k)  -  B  S(y.8T)  -  !($,)(  8^  -  -  -K^H^  -  8T>  . 


In  the  theorea,  therefore,  we  have 


A(  8)  -  G(  8)1(8) 


-  G(  8  )I(  8  )G(  6  ) 

T  **T  —T  —T 


-3' 


Por  recursion  (1),  with  G(0)  -  1(8)  a(8)  «  (so  that  the  eigenvalue  condition 

is  certainly  satisfied),  «  KB^)-1  and  therefore  -  KJj)  \  giving  the  result 
stated  in  Section  1. 

When  Kj,  is  symmetric,  aquation  (3)  can  be  solved  explicitly  in  terms  of  the 
eigenvalues  and  eigenvectors  of  A^,  giving  the  solution  obtained  by  sacks  (1958,  p.  399) 
and  Fabian  (1968).  In  (2),  however,  with 

A(8)  -  I  ( 8)-1I( 8)  , 

-  C  “  “ 

symmetry  of  A(8)  is  not  guaranteed  unless  I^( 8)  and  1(8)  are  both  diagonal.  Equation 
(3)  does,  however,  give  a  linear  equation  from  which  may  be  determined,  in  principle. 

With  recursion  (2)  the  eigenvalue  condition  may  come  into  play.  If  X*  is  the 
minimum  eigenvalue  of  A^,  we  require  X*  >  -j.  Otherwise  '/\i- consistency  does  not  follow 
for  {0^} .  Provided,  however,  X*  >  ■j  0  >  0,  at  is  symmetric  and  there  is  sufficient 
regularity, 

-  V  ■»  N(O,B(0,8t))  ,  (5) 

in  distribution  as  k  *  «•,  where  B„  ”  B(0,0  )  satisfies 

8  -T 

(A*  - 1  te.,B8  *  VS  -  1  *  s  • 

This  result  comes  from  Fabian  (1968).  Although  the  following  result  has  not  been 
tracked  down  it  is  reasonable  to  suppose  that,  if  AT  is  nonsymmetric,  (5)  holds  with  B- 
satisfying 

(At  -  i  8I,)B6  ♦  BfJ(A*  -  {  *.)  *  1%  • 

From  (4)  we  may  interpret  the  eigenvalues  of  A(6)  as  giving  the  amount  of 
information  about  the  parameters  available  in  an  incomplete  observation  relative  to  a 
complete  one.  Were  X*  *  0  it  would  mean  that  not  all  the  parameters  are  identifiable 
from  the  incomplete  data  and  neither  (1)  nor  (2)  would  be  usable.  Otherwise  the  simple 
recursion  (2)  does  give  consistent  estimates,  if  not  asymptotically  efficient  ones. 

One-parameter  problems  were  discussed  in  Titterington  (1982)  and  the  multiparameter 
theorem  above  could  be  applied  to  the  Normal  mixture  problem  described  in  that  paper.  Here 
we  concentrate  on  multivariate  data  with  missing  values. 
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3.  APPLICATION  TO  MISSING-DATA  PROBLEMS 


Tha  problem  will  be  formulated,  aa  in  Rubin  (1976)  and  Little  (1978),  with  the  help  of 
miaaing- value  indicators.  Suppose  each  observation  is  r- dimensional  and  that  d^  is  an 
indicator  vector  for  observation  i  such  that  d^  has  r- component a,  zeros  to  denote 
missing  values  and  ones  elsewhere.  Let  z^  denote  the  set  of  observed  values,  where  the 
symbol  *v'  is  inserted  for  a  component  which  is  missing.  Then  the  overall  observed 
quantity  for  observation  i  is 

*i  "  (£i*3i)  * 

A  typical  complete  observation  would  be 

(x^l)  , 

in  which  1  is  a  vector  of  ones  and  has  no  ava. 

Introduce  the  density  functions  g(s,d|8)  and  ?(x,1|8)  and  make  the  following 
assumptions. 

(i)  g(z,d|8]  -  91(«!fl,)g2(5ie2). 

(ii)  f(x,1|6J  -  f1(x|81)g2(1|82). 

(iii)  8  -  <  , §2 >  where  8^  and  $2  are  distinct  sets  of  parameters. 

If  z  is  now  interpreted  as  representing  just  the  non-missing  components,  it  is  also 
assumed  that 

g.tzIS  )  -  /  f  <x|8  )dx  , 

X(x> 

where  X(s)  is  the  set  of  x  which  can  be  regarded  as  completions  of  z. 

Under  these  assumptions  we  may  ignore  the  missing-data  process  when  making  inferences 
about  8^»  see  Rubin  (1976).  As  a  result  of  (i)  we  may  write  the  score  vector  and 
information  matrix  as 

Stjf, 8)  -  /  S,  (z,  6^ ) 

\  §2LM2> 

and 
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lime 


1(8) 


(V-V  °  )  . 

\  o  i2te2>/ 


so  that 

(I  <8  )-1S  <r,6  ) 

' 

I2<-2)1§2(-'-2) 

This  separation  means  that,  in  the  recursions,  •  and  02  can  be  updated  more  or 
less  separately  and  indeed  updating  6^  reduces  simply  to  the  estimation  of  multinomial 
parameters  using  relative  frequencies*  The  separation  is  not  quite  consists  because 

W  “  £  92<-,-2 ,X1 {-1 l-)  ' 

d 


where  I^tS^ld)  denotes  the  Information  matrix  obtained  from  missing-data  pattern  d. 
Typically  1^ (01 |d)  will  be  calculable,  as  it  is  associated  with  a  complete-data  problem, 
of  lower  dimension  than  r.  Thus,  in  this  class  of  problems  it  is  often  possible  to 
calculate  1^(8^)  explicitly,  although  it  does  depend  on  &2.  In  this  respect  recursion 
(1)  should  be  more  feasible  here  than  it  is  for  mixture  problems,  say. 

It  is  clear,  from  Section  2,  that  1^(8^  has  to  be  positive  definite. 

Example.  Exponential  family  models. 

Suppose 

log  f^xIBj)  “  const  +  t(x)*8j  -  *(8^) 

and  define  *  «  B(t(x)|8  ).  Given  complete  data  x.,...,x  ,  yielding  t.,...,t  ,  the 
■*  “  ■  "1  —I  — n  — i  -n 

-  -1  n 

maximum  likelihood  estimator  for  $  is  ^  ■  n  £  t , ,  which  can  be  calculated 

"  i-1 

recursively  from 

ik+i  "  k  +  '*  ♦  ,r,<Sk+i  -  V  '  <6) 

k  »  0,1,..., n  -  1,  with  j>0  »  0. 

Suppose  t* denotes  the  components  of  t  observed  when  the  missing-data  pattern  in 
t  is  d.  Then,  as  was  shown  in  Titterington  (1982),  where  the  relationship  with  a 
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(7) 


k 


recursive  version  of  the  EM  algorithm  was  pointed  out,  recursion  (2)  can  ba  written 

?x*i  *  *  (k  *  ’>_1  <*<£*♦, -  &  • 

Tor  apacial  exponential  family  problems  this  recursion  takas  another  interesting 


form.  Suppose  we 

write 

sT« 

(d)  (d) 

(t  ,t  )  and  suppose  that 

-d 

is  a  cut;  see  Barndorff- 

Nielsen  (1978, 

P* 

50). 

Then 

t*d^  has  linear  regression  on 

t<d> 

( Bar ndor f f-N ielsen , 

1978,  p.  197), 

SO 

that. 

for  some  matrix  H(£), 

«<t<d,|t(d>,*)  -  ♦(d>  -  H($)<t(d>  - 

♦(d)) 

(8) 

Thus,  partitioning  (7),  we  have 

32!  •  Xd>  i,_1h<v(^!  -  Xd)>  J 


This  pattern  will  be  apparent  in  tha  multivariate  Normal  examples  considered  later. 

A  formula  for  H(£)  can  be  obtained  in  terms  of  the  complete- data  information  matrix 

i’  V*1- 

Since  t*d*  ie  a  cut,  it  follows,  as  in  Barndorff-Nielaen  (1978,  p.  128),  that  t*d* 
itself  has  an  exponential  family  distribution.  CSiven  the  above  partitioning,  the 
appropriate  acore  vecter  for  use  in  (1)  or  (2)  is,  therefore. 


^Vi<dVd>  -i(d)>^ 

where  d*  •  *(t*d*>  and  Ic<$^d* )  1  is  the  leading  square  suboatrix  of  lc<^>  '• 
Suppose  that  $*d*  is  q- dimensional  and  that  the  first  q  columns  of  Ic(^)-1  are  given 

I^'d’  ) 

Then  because  of  tha  form  of  recursion  (2),  the  matrix  H(£)  in  (8)  is  given  by 

I<d,d>i  (d),-1  . 

c  c  •* 

This  can  be  written  as 


{ 


-I  (d,d)_1I  (d,d)  , 
c  c 

where  these  two  aatrices  come  frosi  an  appropriate  partitioning  of  Ic ( J) !  see,  for 
instance  Barndorf f-Nielsen  (1978,  p.3). 

To  investigate  possible  consistency  of  recursions  (1)  and  (2),  nonsingularity  of  the 
incomplete- data  information  matrix  for  J,  I(J),  must  be  sought.  It  is  convenient  to  use 
the  notation  of  Hartley  and  Hocking  (1971).  Suppose  t  and  $  are  s-dimensional  and  that 
t* is  q- dimensional,  with  q  <  s.  Then,  for  a  certain  q  x  s  matrix,  Dd,  of  seros 
and  ones. 


DdS» 


.(d> 


-1  T 

Vc'i'  Dd 


Also,  D  .d;  -  I  ,  the  q  *  q  identity  matrix.  In  the  special  partitioning  of  t 
ad  q  “ 

considered  earlier,  the  first  q  columns  of  Dd  make  up  1^.  For  brevity,  let 

»(d)  -  g2(d||2)  , 

the  probability  of  obtaining  inccaipleteness  pattern  d. 

(d) 

Then  if,  whatever  d  is,  the  corresponding  t  is  a  cut. 


I<*)  -  l  «<d)n^ic(il  »)Dd  . 
d 


Suppose,  for  j  •  1,...,s,  denotes  the  sum  of  *(d)  over  all  d  from  which  the 

jth  component  of  t  can  be  calculated.  Then,  for  nonsingularity  of  I(^),  we  require 
■^  >  0,  for  all  j. 

Whether  or  not  (7)  leads  to  /k-consistent  estimators  depends  on  the  eigenvalues  of 
W  X(V-  In  the  very  simple  case  of  the  s-dimensional  multivariate  exponential 
distribution  with  independent  components,  we  require  that  the  probability  of  observation  of 
each  component  be  at  least  . 

The  rest  of  the  paper  will  be  devoted  to  the  multivariate  Normal  distribution.  Since 
this  belongs  to  the  exponential  family  we  already  have  some  insight  into  this  case.  For 
instance,  consistency  at  even  the  weakest  rate  is  possible  only  if  there  is  positive 
probability  of  observing  each  element  in  the  appropriate  sufficient  statistic  t(x).  For 
the  multivariate  Normal  case  there  must  be  positive  probability  of  observing  each  pair  of 
components  of  x. 
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4.  MULTIVARIATE  NORMAL  DATA 


In  spite  of  the  aforementioned  relationship  with  the  exponential  family,  it  is  helpful 

to  look  at  recursions  for  the  more-  familar  parameters,  to  be  denoted  by  (u,  £).  Given 

complete  observations  x, ,...,x  ,  which  are  independent  and  identically  distributed 
—  i  -n 

N(m,£),  the  maximum  likelihood  estimators  of  ( y,  £)  can  be  generated  recursively  by 


+  (k  +  D  <5*,  -  V 


(B) 


£k+1  £k  +  tk  +  11  ^k  +  1  ^ -k+1  "  ”  •V  \) 


(9) 


starting  from  ■  0,  £Q  «  0.  These  recursions  can  be  obtained  from  the  complete-data 
version  of  (1)  for  the  appropriate  parameters  in  the  exponential  family  representation, 
that  is,  recursion  (6),  and  subsequent  re-expression  in  terms  of  (u,EK  The  direct 
version  of  (1)  in  terms  of  (u,E)  gives  (8)  along  with 

£kt,  ■  £k  +  <k  +  -  V<Vl  -  V*  "  V  ’  ,,0> 

which  is  an  apparently  very  minor  variant  of  (9). 

Further  recourse  to  the  notation  of  Hartley  and  Hocking  (1971)  will  help  the 
consideration  of  the  incomplete- data  problem. 

Suppose  d  represents  a  particular  missing-data  pattern  with  q((s)  observed 
components  in  x.  Suppose  ud  (q  *  1)  and  E^fq  x  q)  are  the  corresponding  mean  vector 
and  covariance  matrix  and  let  Dd  be  as  in  Section  3.  Then 

*d  "  Dd*f  £d  *  Dd£  Dd  ’ 

Let  a  be  the  vector,  of  length  V2S(s  +  1),  of  the  elements  of  Z  written  as  a 
vector.  Let  be  the  vector,  of  length  V2  q(q  ♦  1),  containing  the  elements  of  E^ 

and  let  Cd  be  a  matrix  of  zeros  and  ones  such  that 

’  cd-°  * 

T 

CdCd  will  be  the  identity  matrix  of  order  V*2  q(q  +  1). 

The  complete-data  information  matrix,  per  observation,  is 


where  U  is  a  symmetric  matrix  of  order 


where  I  (w) 
c  * 


I  1  and  I  (o)  »  0  \ 
c  • 


^a(s  +  1).  The  element  in  row  (i,j)  and  column  (u,v)  of  U  is 

o,  a,  +0,0.,  i  <  j,  u  <  v  . 
iu  jv  iv  ju  J 

The  incosplete-data  information  matrix  is  also  block-diagonal. 


Hv.v) 


in  which 


I ( U)  -  l  »(d)D*l"1Dd 
d 

-  I  «(d)D^(Ddr  D*)‘id  (in 

d 

and 

1(C)  -  l  »(d)Cd(Cdt»Cd)-1Cd  .  (12) 


The  explicit  forms  for  (11)  and  (12)  allow  recursion  (1)  to  be  considered  without  the 
need  for  numerical  integration.  A  recursion  using  sample  information  could  also  be 
considered  but  it  will  be  more  complicated  since  the  block-diagonal  form  noted  above  will 
not  obtain.  This  is  why  this  method  is  not  included  in  the  simulation  study  in  Section  5. 

Calculation  of  the  score  vector  associated  with  a  single  observation  also  follows 
Bartley  and  Hocking  (1971).  Suppose  g^(xd>  denotes  the  p.d.f.  for  an  observed  value 
corresponding  to  missing-data  pattern  d.  Then 

fp  log  gi(xd)  -  D^tx,  -  Md)  -  SM(xd),  say, 

h  !og  g,(xd)  -  C^Cd0Cd)_1(sd  -  o,)  -  So(xd>,  say. 

In  the  second  equation,  »d  -  c<j!‘  whara  ■  is  the  suitably-ordered  vector  version 


In  the  case  of 


For  recursion  (2)  we  oust  premultiply  this  score  vector  by  I  (u,o)-1. 

c  *  *■ 

e  complete  observation,  for  which  Cd  is  the  identity  matrix  of  order  ^  s ( s  +  1),  we 
obtain  the  recursions  (8)  and  (10).  Sven  for  an  incomplete  observation  we  obtain  an 
appealing  result.  Partition  the  score  vector,  as  above,  as 


Then 


Suppose,  without  loss  of  generality,  the  first  q  components  of  x  are  observed  and  are 
denoted  by  x^.  Write  xT-  (x^  x*)  and  partition  ji  and  £  correspondingly.  Then 


*1 


S12E11<-1 


-V. 


(13) 


It  is  more  helpful  to  express  the  vector  US^x^)  as  a  k  »  k  symmetric  matrix  at 

T 

this  point*  If  Sjj  denotes  the  Matrix  (x^  -  ^)(x^  -  u^)  then  we  obtain 


8n  -  En 


(S11  "  E11)E11E12 


E12En(S11  "  EH,E11E12 


(14) 


Expressions  (13)  and  (14)  can  now  be  used  in  the  second  term  on  the  right  hand  side  of 
recursion  (2). 

One  disadvantage  of  recursion  (2)  is  that  a  different  form  of  (13)  and  (14)  is 
required  for  each  pattern,  d,  of  missing  values.  Expression  (13),  however,  can  be 


written  as 


5,  -  #, 
52  -  h 


(15) 


M  T  "1 

where  J2  -  $j2  +  ri2E11*5l  -  ) ,  the  regression  function  for  x2  given  x^. 

If,  at  stage  k  ♦  1,  we  can  observe  x^tk  ♦  1)  and  have  obtained,  currently, 
and  Efc,  then,  with  these  quantities  substituted  in  the  right  hand  side  of  (15),  we 
obtain  a  regression  imputation  x^tk  +  1)  for  the  Biasing  values  x2(k  ♦  1).  The  use  of 
(13)  in  recursion  (2)  is  then  equivalent  to  (8)  using  the  imputed  ;2(k  +  1).  The  use  of 
(IS)  reaovea  the  need  for  separate  versions  of  recursion  (2)  for  p.  Hope  that  the 
completed  x^+1  can  be  used  in  (10)  instead  of  (14)  are  ill-founded.  If  from  (IS)  is 

used  in  this  way  then,  instead  of  (14),  we  obtain  a  matrix  which  differs  in  the  bottom 
right  hand  block.  This  block  is 


E12E11<811  "  >E1*E12  "  <E22 


E12E11E12>  ‘ 


T  —  1 

Since  Z22  -  £12E11E12  1s  nonnegative  definite,  this  block  is  “smaller  than  it  should 
be",  and  the  resulting  recursion  would  lead  to  "negative  bias"  in  the  estimator  of  E^. 
This  is  because  the  imputed  x^  is  a  conditional  expected  value  and  the  residual  variance 
is  ignored;  see  also  Beale  and  Little  (197S).  Perhaps  a  more  satisfactory  approach  based 
on  recursive  imputation  and  use  of  (8)  and  (10)  is  to  use,  not  (IS),  but  a  pseudo-random 
iaputation 

ij  -  *2  ♦  E12Eil‘5i  -  V  *  h  '  (16) 

where 


e  ~  N(0.  E  -  ET  E_1E  ) 
=2  -'  22  12‘ll‘l2' 


Such  iaputation  and  updating  does  not  correspond  exactly  to  (13)  and  (14)  but, 
conditionally  on  x^  as  well  as  E^ ,  E^2  and  E22,  the  updating  terms  do,  on  average, 
give  (13)  and  (14). 

We  now  illustrate  these  approaches  for  the  case  of  bivariate  data. 
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5.  ILLUSTRATION  FOR  BIVARIATE  NORMAL  DATA 

He  consider  ths  special  case  of  bivariate  Normal  data  in  which  the  first  component  of 
any  observation  is  always  available  but  the  sscond  component  may  be  missing,  with 
probability  (1  -  ■).  This  is  special  in  that  explicit  solutions  are  available  for  the 
maximum  likelihood  estimates  of  the  parameters,  because  of  ths  “nested"  missing-data 
structural  see  Morrison  <1971),  Rubin  (1974)  and  Hocking  and  Marx  (1979).  It  will  be 
possible,  therefore,  to  compare  our  recursive  methods  easily  with  non-recursive  maximum 
likelihood. 

Recursion  (2)  takes  the  following  form.  At  stage  (k  *■  1),  if  5^+^  complete 
then  (8)  and  (10)  are  used  to  update.  If  only  the  first  component  x^k  ♦  1)  is  given, 
then  (8)  is  used  to  update  £*,  using  the  regression  imputation  Xj<k  ♦  1)  for 
x2<k  ♦  1),  where 

ij(k  ♦  1)  -  m2 (k)  +  \«k+1  »  (17) 

in  which  8k  -  *  •“* 

4k*1  "  *1(lt  *  15  "  Vk)  * 

is  updated  according  to 

I^tk  +  1)  -  ft1(k)  +  (k  ♦  if1^, 

I12(k  +  1)  -  ?12<k)  ♦  (k  ♦  ’)_1\Vl 
E22(k  ♦  D  -  ?22<k>  ♦  <k  + 

where  -  {x^lk  +  1)  -  (^(k)}2  -  E^(k). 

In  order  to  illustrate  the  resulting  bias  in  estimating  Ij2,  a  regression-imputation 
recursion  was  tried,  using  (8)  and  (10)  along  with  the  imputation  formula  (17).  Also,  a 
regression-imputation  with  residual  was  studied,  in  which  a  pseudo-random  was  added 

on  to  (17)  before  it  was  plugged  into  (8)  and  (10). 

ek+1  ~N(0'  £22(k)  "  S12(k>2/En(k))  . 

Another  character istic  of  this  example  is  that  recursion  (1)  will  be  easy  to  apply 
because  of  the  explicit  invertabiUty  of  the  incomplete- data  Information  matrix,  l(ji,o). 


diag{l,Y,2,Y.2Y) 


where  Y  m  *(2w  -  1)  For  6*0,  the  linear  equation*  (3)  must  be  solved  for  B. 

It  should  be  stressed  that  we  are  unusually  lucky  in  this  exsmple  to  be  able  to  use 
recursion  (1)  and  the  proper  maximum  likelihood  procedure  so  easily.  In  many  examples  the 
versions  of  recursion  (2)  should  dominate  as  far  as  computation  time  is  concerned.  Apart 
from  this  consideration,  however,  the  following  comparisons  should  give  a  guide  to  relative 
effectiveness  of  the  different  methods. 

In  the  simulations,  bivariate  Normal  distributions  with  sero  means  and  unit  component 
variances  are  used.  Two  values  of  the  correlation  coefficient,  p  »  0  and  p  *  0.6  are 
considered,  in  each  simulation  the  first  10  observations  ware  complete.  Thereafter,  the 
second  component  of  each  observation  was  treated  as  missing  with  probability  1  -  «  -  3/4, 
independently  for  each  observation.  Since  *  <  ^  this  means  that  we  cannot  expect  the 
optimal  rata  of  consistency  for  recursion  (2).  Extensive  results  are  given  in  an 
unpublished  M.Sc  report  at  the  State  University  of  New  York  at  Albany  by  J-M.  Jiang,  but 
the  results  reported  here  were  obtained  on  a  VAX/780  computer  at  the  University  of 
Wisconsin  in  Madison.  The  estimation  procedures  are  denoted  as  follows. 

I.  Standard  recursion  (2). 

II.  Recursive  procedure  using  (8)  and  (10)  with  regression  imputation  (17). 

III.  As  II  but  incorporating  pseudo-random  residuals  into  the  imputation. 

IV.  Standard  recursion  (1). 

V.  True  maximum  likelihood  from  the  available  data. 

VI.  Recursive  treatment  of  the  complete  data. 

Table  1  gives  root  mean-squared  errors  (RMSt's)  for  the  five  parameters  from  batches 
of  100  simulations,  each  with  total  sample  sise  100. 

The  results  in  Section  (a)  of  the  table  correspond  to  the  use  of  recursions  like  (10) 
for  £,  those  in  Section  (b)  to  (9).  Thus,  VI  in  Section  (b)  corresponds  to  maximum 
likelihood  estimation  on  the  complete  data.  The  natural  superiority  of  Vi(b)  is  clear  from 
the  results.  Other  major  features  of  the  table  are  as  follows. 


FOR  BIVARIATE  NORMAL  PARAMETERS  100  SIMULATIONS,  EACH  OF  100  OBSERVATIONS 


CM 

N 


*•  V0 

**n 


+  \0  to  ® 

r*  m  m  cn 

m  n  cs  »■ 


01  S'  CM  UD  CM 

o  id  *  m  ct 


© 

o 

9 

o 

l*> 

CD 

* 

r- 

2  * 

<n 

01 

CD 

CM 

CM 

CM 

in 

a*  cm 

CM 

CM 

CM 

*■ 

CM 

CM 

CM 

o 

o 

o 

© 

o 

o 

O 

O 

o 

0 

0 

O  O  O  <N  O 

r>.  rv  vo 


CM  CM  CM  ©  CM 
\0  10  «D  h  10 


o  o  o 


(i)  Qualitatively,  the  picture  is  the  seas  for  O  *  0.0  end  p  ”  0.6. 

(ii)  Apert  froa  estimation  of  by  method  II  (regression  imputation)  the  results 

in  (b)  are  better  then  those  in  (a). 

(iii)  Particularly  in  (a),  results  in  XV  (recursion  (1))  are  better  than  those  in 
I-III  ( recursion  ( 2 ) ) . 

(IV)  Method  IV  in  (b)  does  almost  as  wsll  as  msthod  V,  the  actual  maximum 
likelihood. 

(V)  Method  III  (regression  imputation  plus  residual)  is  the  least  efficient  of  the 

variants  of  recursion  (1)  for  estimating  u2  but,  as  expected,  it  is  better  than  method  II 

tor  the  estimation  of  This  illustratss  ths  fact  that  method  XI  should  underestimate 

22 

I22  on  average,  according  to  previous  remarks.  To  check  this  the  frequency  of 
underestimation  of  Z^  was  obtained  for  each  method,  as  reported  in  Table  2.  The  results 
point  to  the  strung  negative  bias  in  method  II. 

The  whole  exercise  was  carried  out  also  for  samples  of  sise  SO  with  qualitatively  the 

same  results  as  above. 

Me  have  already  pointed  out  the  criticism  that  the  results  for  the  recursive  methods 
are  order-dependent,  an  obvious  disadvantage  for  the  treatment  of  a  finite  data-set.  A 
very  limited  assessment  of  the  order  effect  was  obtained  by  comparing  the  results  gathered 
from  one  ordering  of  the  data  and  its  reverse.  (Consideration  of  all  possible  orderings 
would  be  out  of  the  question  although  a  more  ambitious  project  would  be  to  compare  a 
moderately  large  number  of  "random"  orderings.)  Table  3  gives,  from  the  100  simulatione, 
the  root  mean  equared  difference  in  parameter  estimates  obtainsd  by  the  two  orderings  for 
methods  I-IV.  For  some  of  the  parameter  estimates,  as  indicated,  the  ordering  does  not 
affect  the  value.  The  most  variation  occurs  with  methods  I,  II  and  III  particularly  in  the 
estimation  of  Z^.  The  results  favor  section  (b)  of  the  table  very  slightly. 

Further  numerical  investigation  is  recorded  in  the  M.Sc  report  of  J-M.  Jiang.  In 
particular  he  investigated  the  effect  of  a  double-pass  through  the  data.  He  considered  a 
pass  through  (forwards)  followed  by  another  pass  with  the  reverse  ordering  (backwards).  He 
also  considered  the  corresponding  backwards-forwards  double-pass.  Among  the  conclusions 
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«. 


ware  tha  following,  both  of  which  support  intuition. 

(i).  Doubla-passaa  produced  aaallar  RMSK's  than  single  passes. 

<ii).  The  differences  between  the  forwards- backwards  double-pass  and  the  backwarda- 
forwards  double-pass  were  small  and  even  more  so  than  the  effects  exemplified  in  Table  3 
for  different  single-pass  procedures. 

Jiang  also  considered  the  performance  of  these  recursive  methods  for  the  simp ler 
problem  of  estimating  the  mixing  weight  in  a  mixture  of  two  known  densities.  He  found,  in 
particular,  that  there  the  "order  effect"  seemed  less  substantial  than  the  among- 
replications  effect.  Comparison  of  Tables  1  and  3  suggests  that  this  may  not  be  the  case 
in  our  example,  particular  in  tha  estimation  of  . 
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6.  DISCUSSION 


Although  this  paper  ha  a  concentrated  on  missing-value  problems,  the  general  procedures 
defined  by  (1)  and  (2)  could  be  applied  to  many  other  incomplete  data  problems.  Some  were 
looked  at  by  Titterington  <1982)  and  other  illustrations  could  be  drawn  from  Dempster, 
laird  and  Rubin  (1977).  in  missing  value  problems,  recursion  (1)  is  comparatively  easy  to 
use  and  particularly  so  in  the  illustration  of  Section  5.  It  seeau  likely  that,  in  spite 
of  their  inferior  asymptotic  performance,  recursion  (2)  and  the  imputation-based  versions 
thereof  will  be  appealing  for  their  comparative  simplicity  in  application.  As  far  as 
Imputation  is  concerned.  Section  S  illustrates  once  more  the  disadvantages  that  can  arise 
from  amen- imputation  as  compared  with  pseudo-random  imputation,  a  point  stressed  by 
Sedransk  and  Titterington  (1980).  They  also  describe  methods,  such  as  the  hot-deck,  for 
dealing  with  incomplete  data  from  sample  surveys.  It  is  hoped  that  this  is  one  area  in 
particular  where  these  recursive  procedures  will  be  very  useful. 
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